Let $X\mid\mu \sim N_p(\mu, v_x I)$ and $Y\mid\mu \sim N_p(\mu, v_y I)$ be independent
p-dimensional multivariate normal vectors with common unknown mean $\mu$.
Based on only observing $X = x$, we consider the problem of obtaining a
predictive density $\hat p(y\mid x)$ for $Y$ that is close to $p(y\mid\mu)$ as
measured by expected Kullback-Leibler loss. A natural procedure for this
problem is the (formal) Bayes predictive density $\hat p_U(y\mid x)$ under the
uniform prior $\pi_U(\mu) \equiv 1$, which is best invariant and minimax. We show
that any Bayes predictive density will be minimax if it is obtained by a
prior yielding a marginal that is superharmonic or whose square root is
superharmonic. This yields wide classes of minimax procedures that
dominate $\hat p_U(y\mid x)$, including Bayes predictive densities under
superharmonic priors. Fundamental similarities and differences with the
parallel theory of estimating a multivariate normal mean under quadratic
loss are described.



1. Introduction. Let $X\mid\mu \sim N_p(\mu, v_x I)$ and $Y\mid\mu \sim N_p(\mu, v_y I)$ be
independent p-dimensional multivariate normal vectors with common unknown
mean $\mu$, and let $p(x\mid\mu)$ and $p(y\mid\mu)$ denote the conditional densities of
$X$ and $Y$. We assume that $v_x$ and $v_y$ are known.

Based on only observing $X = x$, we consider the problem of obtaining
a predictive density $\hat p(y\mid x)$ for $Y$ that is close to $p(y\mid\mu)$. We
measure this closeness by Kullback-Leibler (KL) loss,

(1)  $L(\mu, \hat p(\cdot\mid x)) = \int p(y\mid\mu)\,\log\dfrac{p(y\mid\mu)}{\hat p(y\mid x)}\,dy,$

and evaluate $\hat p$ by its expected loss or risk function

(2)  $R_{\mathrm{KL}}(\mu, \hat p) = \int p(x\mid\mu)\,L(\mu, \hat p(\cdot\mid x))\,dx.$



For the comparison of two procedures, we say that $\hat p_1$ dominates $\hat p_2$ if
$R_{\mathrm{KL}}(\mu, \hat p_1) \le R_{\mathrm{KL}}(\mu, \hat p_2)$ for all $\mu$ and with strict inequality
for some $\mu$. By a sufficiency and transformation reduction, this problem
is seen to be equivalent to estimating the predictive density of $X_{n+1}$
under KL loss based on observing $X_1, \ldots, X_n$ when
$X_1, \ldots, X_{n+1}\mid\mu$ i.i.d. $\sim N_p(\mu, \Sigma)$. For distributions beyond the
normal, versions and approaches for the KL risk prediction problem have
been developed by Aslan [2], Harris [10], Hartigan [11], Komaki [12, 14]
and Sweeting, Datta and Ghosh [24].

For any prior distribution $\pi$ on $\mu$, Aitchison [1] showed that the average
risk $r(\pi, \hat p) = \int R_{\mathrm{KL}}(\mu, \hat p)\,\pi(\mu)\,d\mu$ is minimized by

(3)  $\hat p_\pi(y\mid x) = \int p(y\mid\mu)\,\pi(\mu\mid x)\,d\mu,$

which we will refer to as a Bayes predictive density. Unless $\pi$ is a
trivial point prior, $\hat p_\pi(y\mid x) \notin \{p(y\mid\mu) : \mu \in R^p\}$, that is, $\hat p_\pi$ will
not correspond to a "plug-in" estimate for $\mu$, although under suitable
conditions on $\pi$, $\hat p_\pi(y\mid x)$ approaches a plug-in density as $v_x \to 0$.

For this problem, the best invariant predictive density (with respect to
the location group) is the Bayes predictive density under the uniform prior
$\pi_U(\mu) \equiv 1$, namely

(4)  $\hat p_U(y\mid x) = \dfrac{1}{\{2\pi(v_x+v_y)\}^{p/2}}\exp\left\{-\dfrac{\|y-x\|^2}{2(v_x+v_y)}\right\},$

which has constant risk; see [18] and [19]. More precisely, one might refer
to $\hat p_U$ as a formal Bayes procedure because $\pi_U$ is improper. Aitchison [1]
showed that $\hat p_U(y\mid x)$ dominates the plug-in predictive density
$p(y\mid\hat\mu_{\mathrm{MLE}})$, which simply substitutes the maximum likelihood estimate
$\hat\mu_{\mathrm{MLE}} = x$ for $\mu$. As will be seen in Section 2, $\hat p_U$ is minimax for KL
loss (1). That $\hat p_U$ is best invariant and minimax can also be seen as a
special case of the more general recent results in Liang and Barron [17],
who also show that $\hat p_U$ is admissible when $p = 1$ under the same loss.
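
The domination of the plug-in density by $\hat p_U$ can be made concrete, because both procedures are spherical normal densities and their KL risks are available in closed form. The following Python sketch is only an illustration under that standard KL formula for two spherical normals; the function names are ours, and the Monte Carlo check is run at $\mu = 0$ only (both risks are constant in $\mu$).

```python
import math
import random

def kl_risk_uniform(p, vx, vy):
    """KL risk of the N_p(x, (vx+vy) I) predictive density; constant in mu."""
    return 0.5 * p * math.log(1.0 + vx / vy)

def kl_risk_plugin(p, vx, vy):
    """KL risk of the plug-in density N_p(x, vy I); constant in mu."""
    return 0.5 * p * vx / vy

def kl_normal(mean_gap_sq, p, v_true, v_pred):
    """KL( N_p(mu, v_true I) || N_p(m, v_pred I) ), with ||mu - m||^2 = mean_gap_sq."""
    return 0.5 * (p * (math.log(v_pred / v_true) + v_true / v_pred - 1.0)
                  + mean_gap_sq / v_pred)

def mc_risk(p, vx, vy, v_pred, n=20000, seed=0):
    """Monte Carlo risk at mu = 0 of the predictive density N_p(x, v_pred I)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # ||x - mu||^2 for x ~ N_p(mu, vx I)
        gap_sq = sum(rng.gauss(0.0, math.sqrt(vx)) ** 2 for _ in range(p))
        total += kl_normal(gap_sq, p, vy, v_pred)
    return total / n

if __name__ == "__main__":
    p, vx, vy = 5, 1.0, 0.2
    print("uniform-prior risk:", kl_risk_uniform(p, vx, vy), mc_risk(p, vx, vy, vx + vy))
    print("plug-in risk:      ", kl_risk_plugin(p, vx, vy), mc_risk(p, vx, vy, vy))
```

Since $\log(1+t) < t$ for $t > 0$, the first risk is smaller than the second for every choice of $p$, $v_x$ and $v_y$.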

However, $\hat p_U$ is inadmissible when $p \ge 3$. Komaki [13] proved that when
$p \ge 3$, $\hat p_U$ itself is dominated by the (formal) Bayes predictive density

(5)  $\hat p_H(y\mid x) = \int p(y\mid\mu)\,\pi_H(\mu\mid x)\,d\mu,$

where

(6)  $\pi_H(\mu) = \|\mu\|^{-(p-2)}$

is the (improper) harmonic prior recommended by Stein [21], which we
subscript by "H" for harmonic. Although Komaki referred to $\pi_H$ as harmonic,
his proof did not directly exploit this property.

More recently, Liang [16] showed that $\hat p_U$ is also dominated by the proper
Bayes predictive density $\hat p_a(y\mid x)$ under the prior $\pi_a(\mu)$ (see [23]) defined
hierarchically as

(7)  $\mu\mid s \sim N_p(0, s v_0 I), \qquad s \sim (1+s)^{a-2}.$

Here $v_0$ and $a$ are hyperparameters. The conditions for domination are that
$v_0 \ge v_x$, and $a \in [0.5, 1)$ when $p = 5$ and $a \in [0, 1)$ when $p \ge 6$. Note that
$\pi_a$ depends on the constant $v_0$ in (7), a dependence that will be maintained
throughout this paper. The harmonic prior $\pi_H$ is well known to be the
special case of $\pi_a$ when $a = 2$.

These results closely parallel some key developments concerning minimax
estimation of a multivariate normal mean under quadratic loss. Based on
observing $X\mid\mu \sim N_p(\mu, I)$, that problem is to estimate $\mu$ under

(8)  $R_Q(\mu, \hat\mu) = E_\mu\|\hat\mu - \mu\|^2,$

where we have denoted quadratic risk by $R_Q$ to distinguish it from the KL
risk $R_{\mathrm{KL}}$ in (2). Under $R_Q$, $\hat\mu_{\mathrm{MLE}} = X$ is best invariant and minimax, and
is admissible if and only if $p \le 2$. Note that $\hat\mu_{\mathrm{MLE}}$ plays the same role
here that $\hat p_U$ plays in our KL risk problem. A further connection between
$\hat\mu_{\mathrm{MLE}}$ and $\hat p_U$ is revealed by the fact that $\hat\mu_{\mathrm{MLE}} = E_{\pi_U}(\mu\mid x)$, the
posterior mean of $\mu$ under $\pi_U(\mu) \equiv 1$.

Stein [21] showed that $\hat\mu_H = E_{\pi_H}(\mu\mid x)$, the posterior mean under $\pi_H$,
dominates $\hat\mu_{\mathrm{MLE}}$ when $p \ge 3$, and Strawderman [23] showed that
$\hat\mu_a = E_{\pi_a}(\mu\mid x)$, the proper Bayes rule under $\pi_a$ when $v_x = v_0 = 1$,
dominates $\hat\mu_{\mathrm{MLE}}$ when $a \in [0.5, 1)$ for $p = 5$ and when $a \in [0, 1)$ for
$p \ge 6$. Comparing these results to those of Komaki and Liang in the
predictive density problem, the parallels are striking. A principal purpose
of our paper is to draw out these parallels in a more unified and
transparent way.

For these and other shrinkage domination results in the quadratic risk
estimation problem, there exists a unifying theory that focuses on the
properties of the marginal distribution of $X$ under $\pi$, namely

(9)  $m_\pi(x) = \int p(x\mid\mu)\,\pi(\mu)\,d\mu.$

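The key to this theory is the representation due to Brown [4] that any
posterior mean of $\mu$, $\hat\mu_\pi = E_\pi(\mu\mid x)$, is of the form

(10)  $\hat\mu_\pi = x + \nabla\log m_\pi(x),$

where $\nabla = (\partial/\partial x_1, \ldots, \partial/\partial x_p)'$. To show that $\hat\mu_H$ dominates $\hat\mu_{\mathrm{MLE}}$,
Stein [21, 22] used this representation to establish that
$R_Q(\mu, \hat\mu_{\mathrm{MLE}}) - R_Q(\mu, \hat\mu_\pi) = E_\mu U(X)$, where

(11)  $U(X) = \|\nabla\log m_\pi(X)\|^2 - 2\,\dfrac{\nabla^2 m_\pi(X)}{m_\pi(X)}$

(12)  $\qquad\;\; = -4\,\dfrac{\nabla^2\sqrt{m_\pi(X)}}{\sqrt{m_\pi(X)}}$

is an unbiased estimate of the risk reduction of $\hat\mu_\pi$ over $\hat\mu_{\mathrm{MLE}}$, where
$\nabla^2 m_\pi(x) = \sum_i \partial^2 m_\pi(x)/\partial x_i^2$.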
Because $\hat\mu_{\mathrm{MLE}}$ is minimax, it follows immediately from (11) that
$\nabla^2 m_\pi(x) \le 0$ is a sufficient (though not necessary) condition for $\hat\mu_\pi$
to be minimax, and as long as $m_\pi(x)$ is not constant, for $\hat\mu_\pi$ to dominate
$\hat\mu_{\mathrm{MLE}}$. [Recall that a function $m(x)$ is superharmonic when $\nabla^2 m(x) \le 0$.]
The fact that $\hat\mu_H$ dominates $\hat\mu_{\mathrm{MLE}}$ when $p \ge 3$ now follows easily from the
fact that nonconstant superharmonic priors [of which the harmonic prior
$\pi_H(\mu)$ is of course a special case] yield superharmonic marginals $m_\pi$ for $X$.
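
To see Stein's unbiased estimate of risk reduction at work, it helps to take a prior for which everything is explicit. The sketch below assumes, purely for illustration (this choice is not made in the text), the proper prior $\mu \sim N_p(0, tI)$ with $X\mid\mu \sim N_p(\mu, I)$, so that $m_\pi$ is the $N_p(0, (1+t)I)$ density and the posterior mean is $tX/(1+t)$. It compares $E_\mu U(X)$ with the directly computed difference in quadratic risk; all function names are hypothetical.

```python
def stein_u(x, t):
    """U(x) = ||grad log m||^2 - 2 * lap(m)/m for the N_p(0, (1+t) I) marginal."""
    p = len(x)
    grad_sq = sum((xi / (1.0 + t)) ** 2 for xi in x)   # ||grad log m(x)||^2
    lap_m_over_m = -p / (1.0 + t) + grad_sq            # lap(log m) + ||grad log m||^2
    return grad_sq - 2.0 * lap_m_over_m

def expected_u(p, t, mu_norm_sq):
    """E_mu U(X), using E||X||^2 = p + ||mu||^2 when X ~ N_p(mu, I)."""
    return 2.0 * p / (1.0 + t) - (p + mu_norm_sq) / (1.0 + t) ** 2

def risk_reduction(p, t, mu_norm_sq):
    """R_Q(mu, X) - R_Q(mu, cX) with c = t/(1+t), computed directly."""
    c = t / (1.0 + t)
    return p - (c * c * p + (1.0 - c) ** 2 * mu_norm_sq)

if __name__ == "__main__":
    for p, t, m2 in [(3, 1.0, 0.0), (5, 2.0, 4.0), (10, 0.5, 25.0)]:
        print(p, t, m2, expected_u(p, t, m2), risk_reduction(p, t, m2))
```

The two columns agree, and the last case is negative: a proper normal prior does not have a superharmonic marginal, so its posterior mean improves on $\hat\mu_{\mathrm{MLE}}$ only for $\mu$ near the prior mean.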

It follows from (12) that the weaker condition $\nabla^2\sqrt{m_\pi(x)} \le 0$ is
sufficient for $\hat\mu_\pi$ to be minimax, although strict inequality on a set of
positive Lebesgue measure is then needed to guarantee domination over
$\hat\mu_{\mathrm{MLE}}$. Fourdrinier, Strawderman and Wells [6] showed that the Strawderman
priors $\pi_a$ in (7) yield superharmonic $\sqrt{m_\pi}$, so that the minimaxity of the
Strawderman estimators is established by (12). In fact, it follows from
their results that $\pi_a$ also yields superharmonic $\sqrt{m_\pi}$ when $a \in [1, 2)$ and
$p \ge 3$, thereby broadening the class of formal Bayes minimax estimators.

One major aim of the present paper is to establish an analogous unifying
theory for the KL risk prediction problem. Paralleling (10), we begin by
showing how any Bayes predictive density $\hat p_\pi$ can be explicitly represented
in terms of $\hat p_U$ and the form of the corresponding marginal $m_\pi$. Coupled
with the heat equation, Brown's representation and Stein's identity, this
representation is seen to lead to a new identity that links KL risk
reduction to Stein's unbiased estimate of risk reduction. Based on this
link, we obtain sufficient conditions on $m_\pi$ for minimaxity and domination
of $\hat p_\pi$ over $\hat p_U$. These general conditions subsume the specialized results
of Komaki [13] and Liang [16] and can be used to obtain wide classes of
improved minimax Bayes predictive densities including $\hat p_H$ and $\hat p_a$.
Furthermore, the underlying priors and marginals can be readily adapted to
obtain minimax shrinkage toward an arbitrary point or subspace, and linear
combinations of superharmonic priors and marginals can be constructed to
obtain minimax multiple shrinkage predictive density analogues of the
minimax multiple shrinkage estimators of George [7, 8, 9]. Thus, the
parallels between the estimation and the prediction problem are broad,
both qualitatively and technically. The main contribution of this paper is
to establish this interesting connection.

2. General conditions for minimaxity. In this section we develop and
prove our main results concerning general conditions under which a Bayes
predictive density $\hat p_\pi(y\mid x)$ in (3) will be minimax and dominate $\hat p_U(y\mid x)$.
We begin with three lemmas that may also be of independent interest. The
following general notation will be useful throughout. For $Z\mid\mu \sim N_p(\mu, vI)$
and a prior $\pi$ on $\mu$, we denote the marginal distribution of $Z$ by

$m_\pi(z; v) = \int p(z\mid\mu)\,\pi(\mu)\,d\mu.$

In terms of this notation, the marginal distributions of $X\mid\mu \sim N_p(\mu, v_x I)$
and $Y\mid\mu \sim N_p(\mu, v_y I)$ under $\pi$ are then $m_\pi(x; v_x)$ and $m_\pi(y; v_y)$, respectively.

Lemma 1. If $m_\pi(z; v_x)$ is finite for all $z$, then for every $x$, $\hat p_\pi(y\mid x)$
will be a proper probability distribution over $y$. Furthermore, the mean of
$\hat p_\pi(y\mid x)$ is equal to $E_\pi(\mu\mid x)$.

Proof. Both claims follow by integrating (3) with respect to y and 
switching the order of integration using the Fubini-Tonelli theorem. □ 

Lemma 1 is important because, for our decision problem to be meaningful,
it is necessary for a predictive density to be a proper probability
distribution. By the laws of probability, a Bayes predictive density
$\hat p_\pi(y\mid x)$ will be a proper probability distribution whenever $\pi(\mu)$ is a
proper prior distribution. But by Lemma 1, improper $\pi(\mu)$ can still yield
proper $\hat p_\pi(y\mid x)$ under a very weak condition.

Our next lemma establishes a key alternative representation of $\hat p_\pi(y\mid x)$
that makes use of the weighted mean

(13)  $W = \dfrac{v_y X + v_x Y}{v_x + v_y}.$

Note that $W$ would be a sufficient statistic for $\mu$ if both $X$ and $Y$ were
observed. As $X$ and $Y$ are independent (conditionally on $\mu$), it follows that
$W\mid\mu \sim N_p(\mu, v_w I)$ where

(14)  $v_w = \dfrac{v_x v_y}{v_x + v_y}.$

The marginal distribution of $W$ is then $m_\pi(w; v_w)$.



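Lemma 2. For any prior $\pi(\mu)$, $\hat p_\pi(y\mid x)$ can be expressed as

(15)  $\hat p_\pi(y\mid x) = \dfrac{m_\pi(w; v_w)}{m_\pi(x; v_x)}\,\hat p_U(y\mid x),$

where $\hat p_U(y\mid x)$ is defined by (4). Furthermore, the difference between the KL
risks of $\hat p_U(y\mid x)$ and $\hat p_\pi(y\mid x)$ is given by

(16)  $R_{\mathrm{KL}}(\mu, \hat p_U) - R_{\mathrm{KL}}(\mu, \hat p_\pi) = E_{\mu,v_w}\log m_\pi(W; v_w) - E_{\mu,v_x}\log m_\pi(X; v_x),$

where $E_{\mu,v}(\cdot)$ stands for expectation with respect to the $N_p(\mu, vI)$ distribution.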



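Proof. The joint marginal distribution of $X$ and $Y$ under $\pi$ is

$p_\pi(x, y) = \int p(x\mid\mu)\,p(y\mid\mu)\,\pi(\mu)\,d\mu$

$\quad = \int \dfrac{1}{(2\pi v_x)^{p/2}}\exp\left\{-\dfrac{\|x-\mu\|^2}{2v_x}\right\}\dfrac{1}{(2\pi v_y)^{p/2}}\exp\left\{-\dfrac{\|y-\mu\|^2}{2v_y}\right\}\pi(\mu)\,d\mu$

$\quad = \dfrac{1}{\{2\pi(v_x+v_y)\}^{p/2}}\exp\left\{-\dfrac{\|y-x\|^2}{2(v_x+v_y)}\right\}\int \dfrac{1}{(2\pi v_w)^{p/2}}\exp\left\{-\dfrac{\|w-\mu\|^2}{2v_w}\right\}\pi(\mu)\,d\mu$

$\quad = \hat p_U(y\mid x)\,m_\pi(w; v_w).$

The representation (15) now follows since $\hat p_\pi(y\mid x) = p_\pi(x, y)/m_\pi(x; v_x)$.
To prove (16), the KL risk difference can be expressed as

$R_{\mathrm{KL}}(\mu, \hat p_U) - R_{\mathrm{KL}}(\mu, \hat p_\pi) = \int\!\!\int p(x\mid\mu)\,p(y\mid\mu)\,\log\dfrac{\hat p_\pi(y\mid x)}{\hat p_U(y\mid x)}\,dx\,dy$

$\quad = \int\!\!\int p(x\mid\mu)\,p(y\mid\mu)\,\log\dfrac{m_\pi(w; v_w)}{m_\pi(x; v_x)}\,dx\,dy,$

where the second equality makes use of (15). The second expression in (16)
is seen to equal this last expression by the change of variable theorem. □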
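
Representation (15) is easy to check numerically in a case where the Bayes predictive density is known exactly. The following sketch assumes $p = 1$ and the conjugate prior $\mu \sim N(0, t)$ (an illustrative choice, not one used in the text), for which the posterior is normal and $m_\pi(z; v)$ is the $N(0, t + v)$ density. It evaluates $\hat p_\pi(y\mid x)$ directly and through the marginal ratio; the function names are hypothetical.

```python
import math

def normal_pdf(z, mean, var):
    """Density of N(mean, var) at z."""
    return math.exp(-(z - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bayes_predictive_direct(y, x, vx, vy, t):
    """Exact Bayes predictive density of Y given X = x under mu ~ N(0, t), p = 1."""
    post_mean = x * t / (t + vx)
    post_var = t * vx / (t + vx)
    return normal_pdf(y, post_mean, vy + post_var)

def bayes_predictive_via_marginals(y, x, vx, vy, t):
    """Same density computed as m(w; vw) / m(x; vx) times the uniform-prior density."""
    vw = vx * vy / (vx + vy)
    w = (vy * x + vx * y) / (vx + vy)
    m = lambda z, v: normal_pdf(z, 0.0, t + v)   # marginal of Z under the N(0, t) prior
    p_uniform = normal_pdf(y, x, vx + vy)        # predictive density under the uniform prior
    return m(w, vw) / m(x, vx) * p_uniform

if __name__ == "__main__":
    for x, y in [(1.0, 0.5), (-2.0, 3.0), (0.0, 0.0)]:
        print(bayes_predictive_direct(y, x, 1.0, 0.2, 2.0),
              bayes_predictive_via_marginals(y, x, 1.0, 0.2, 2.0))
```

The two evaluations agree to rounding error at every $(x, y)$ pair.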

Paralleling Brown's representation (10), representation (15) reveals the
explicit role played by the marginal distribution of the data under $\pi$.
Analogous to Bayes estimators $E_\pi(\mu\mid x)$ of $\mu$ that "shrink" $\hat\mu_{\mathrm{MLE}} = x$, this
representation reveals that Bayes predictive densities $\hat p_\pi(y\mid x)$ "shrink"
$\hat p_U(y\mid x)$ by a factor $m_\pi(w; v_w)/m_\pi(x; v_x)$. However, the nature of the
shrinkage by $\hat p_\pi(y\mid x)$ is different than that by $E_\pi(\mu\mid x)$. To ensure that
$\hat p_\pi(y\mid x)$ remains a proper probability distribution, the factor
$m_\pi(w; v_w)/m_\pi(x; v_x)$ cannot be strictly less than 1. In contrast to simply
shifting $\hat\mu_{\mathrm{MLE}} = x$ toward the mean of $\pi$, $\hat p_\pi(y\mid x)$ adjusts $\hat p_U(y\mid x)$ to
concentrate more on the higher probability regions of $\pi$. Figure 1
illustrates such shrinkage of $\hat p_U(y\mid x)$ by $\hat p_H(y\mid x)$ in (5) when $v_x = 1$,
$v_y = 0.2$ and $p = 5$.

For our purposes, the principal benefit of (15) is that it reduces the KL
risk difference (16) to a simple functional of the marginal $m_\pi(z; v)$. As
will be seen in the proof of Theorem 1 below, (16) is the key to
establishing general conditions for the dominance of $\hat p_\pi$ over $\hat p_U$. First,
however, we use it to facilitate a simple direct proof of the minimaxity of
$\hat p_U$, a result that also follows from the more general results of Liang and
Barron [17].

Corollary 1. The Bayes predictive density under $\pi_U(\mu) \equiv 1$, namely
$\hat p_U$, is minimax under $R_{\mathrm{KL}}$.

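Proof. By a transformation of variables, $x \to (x - \mu)$ and $y \to (y - \mu)$, it
is easy to see that $R_{\mathrm{KL}}(\mu, \hat p_U) = R_{\mathrm{KL}}(0, \hat p_U) = r$ for all $\mu$, so that
$R_{\mathrm{KL}}(\mu, \hat p_U)$ is constant. Next, we show that $r$ is a Bayes risk limit of a
sequence of Bayes rules $\hat p_{\pi_n}$ with $\pi_n(\mu) = N_p(0, \sigma_n^2 I)$, where $\sigma_n^2 \to \infty$ as
$n \to \infty$. By the fact that $r(\pi_n, \hat p_U) = r$ and (16),

(17)  $r - r(\pi_n, \hat p_{\pi_n}) = \int \pi_n(\mu)\,[E_{\mu,v_w}\log m_{\pi_n}(W; v_w) - E_{\mu,v_x}\log m_{\pi_n}(X; v_x)]\,d\mu.$

It is now easy to check that (17) $= O(1/\sigma_n^2)$ and hence goes to zero as $n$
goes to infinity. By Theorem 5.18 of [3], the minimaxity of $\hat p_U$ follows. □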
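
The rate claimed in this proof can be made explicit. Under $\pi_n = N_p(0, \sigma_n^2 I)$ the marginal $m_{\pi_n}(z; v)$ is the $N_p(0, (\sigma_n^2 + v)I)$ density, and integrating $\log m_{\pi_n}$ against it gives $-(p/2)\log\{2\pi(\sigma_n^2 + v)\} - p/2$, so the Bayes risk gap reduces to $(p/2)\log\{(\sigma_n^2 + v_x)/(\sigma_n^2 + v_w)\}$. The sketch below evaluates this expression, which is our own short calculation rather than a formula quoted from the text; the function name is hypothetical.

```python
import math

def bayes_risk_gap(p, vx, vy, sigma_sq):
    """r - r(pi_n, p_pi_n) for the N_p(0, sigma_sq I) prior, from the closed form
    (p/2) * log((sigma_sq + vx) / (sigma_sq + vw)), with vw = vx*vy/(vx+vy)."""
    vw = vx * vy / (vx + vy)
    return 0.5 * p * math.log((sigma_sq + vx) / (sigma_sq + vw))

if __name__ == "__main__":
    p, vx, vy = 5, 1.0, 0.2
    for sigma_sq in (1.0, 10.0, 100.0, 1e4, 1e6):
        gap = bayes_risk_gap(p, vx, vy, sigma_sq)
        # gap * sigma_sq settles near (p/2) * (vx - vw), i.e. the gap is O(1/sigma_sq)
        print(sigma_sq, gap, gap * sigma_sq)
```

As a consistency check, setting `sigma_sq = 0` (a point prior at the origin, whose Bayes risk is zero) returns $(p/2)\log(1 + v_x/v_y)$, the constant risk $r$ of $\hat p_U$.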

Our next lemma provides a new identity that links $E_{\mu,v}\log m_\pi(Z; v)$ to
Stein's unbiased estimate of risk reduction $U(x)$ in (11) and (12) for the
quadratic risk estimation problem. When combined with (16) in Theorem 1,
this identity will be seen to play a key role in establishing sufficient
conditions on $m_\pi$ for $\hat p_\pi$ to be minimax and to dominate $\hat p_U$.

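Lemma 3. If $m_\pi(z; v_x)$ is finite for all $z$, then for any $v_w \le v \le v_x$,
$m_\pi(z; v)$ is finite. Moreover,

(18)  $\dfrac{\partial}{\partial v}E_{\mu,v}\log m_\pi(Z; v) = E_{\mu,v}\left[\dfrac{\nabla^2 m_\pi(Z; v)}{m_\pi(Z; v)} - \dfrac{1}{2}\|\nabla\log m_\pi(Z; v)\|^2\right]$

(19)  $\qquad = E_{\mu,v}\left[\dfrac{2\,\nabla^2\sqrt{m_\pi(Z; v)}}{\sqrt{m_\pi(Z; v)}}\right].$

Fig. 1. Shrinkage of $\hat p_U(y\mid x)$ to obtain $\hat p_H(y\mid x)$ when $v_x = 1$, $v_y = 0.2$ and $p = 5$. Here $y = (y_1, y_2, 0, 0, 0)$.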
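Proof. When $m_\pi(z; v_x)$ is finite for all $z$, it is easy to check that for
any fixed $z$ and any $v_w \le v \le v_x$,

$m_\pi(z; v) \le \left(\dfrac{v_x}{v_w}\right)^{p/2} m_\pi(z; v_x) < \infty.$

Letting $Z^* = (Z - \mu)/\sqrt{v} \sim N_p(0, I)$, we obtain

(20)  $\dfrac{\partial}{\partial v}E_{\mu,v}\log m_\pi(Z; v) = \dfrac{\partial}{\partial v}E\log m_\pi(\sqrt{v}Z^* + \mu; v) = E\,\dfrac{(\partial/\partial v)\,m_\pi(\sqrt{v}Z^* + \mu; v)}{m_\pi(\sqrt{v}Z^* + \mu; v)},$

where, writing $z = \sqrt{v}z^* + \mu$,

$\dfrac{\partial}{\partial v}m_\pi(\sqrt{v}z^* + \mu; v) = \dfrac{\partial}{\partial v}\int \dfrac{1}{(2\pi v)^{p/2}}\exp\left\{-\dfrac{\|\sqrt{v}z^* + \mu - \mu'\|^2}{2v}\right\}\pi(\mu')\,d\mu'$

$\quad = \dfrac{\partial}{\partial v}m_\pi(z; v) - \int \dfrac{(z-\mu)'(z-\mu')}{2v^2}\,p(z\mid\mu')\,\pi(\mu')\,d\mu'.$

Using the fact that

(21)  $\dfrac{\partial}{\partial v}m_\pi(z; v) = \dfrac{1}{2}\nabla^2 m_\pi(z; v),$

which is straightforward to verify, and by Brown's representation
$E_\pi(\mu'\mid z) = z + v\nabla\log m_\pi(z; v)$ from (10),

(22)  $E\,\dfrac{(\partial/\partial v)\,m_\pi(\sqrt{v}Z^* + \mu; v)}{m_\pi(\sqrt{v}Z^* + \mu; v)} = E_{\mu,v}\left[\dfrac{1}{2}\dfrac{\nabla^2 m_\pi(Z; v)}{m_\pi(Z; v)} + \dfrac{(Z-\mu)'\nabla\log m_\pi(Z; v)}{2v}\right].$

Finally, by (2.3) of [22],

(23)  $E_{\mu,v}\,\dfrac{(Z-\mu)'\nabla\log m_\pi(Z; v)}{2v} = E_{\mu,v}\,\dfrac{1}{2}\nabla^2\log m_\pi(Z; v)$

(24)  $\qquad = E_{\mu,v}\,\dfrac{1}{2}\left(\dfrac{\nabla^2 m_\pi(Z; v)}{m_\pi(Z; v)} - \|\nabla\log m_\pi(Z; v)\|^2\right).$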
Combining (20), (22) and (24) yields (18). That (18) equals (19) can be
verified directly. □

It may be of independent interest to note that the intermediate step (21)
is in fact a restatement of the well-known fact that any Gaussian
convolution will solve the homogeneous heat equation, which has a long
history in science and engineering; for example, see [20]. Brown, DasGupta,
Haff and Strawderman [5] recently used identities derived from the heat
equation, including one bearing a formal similarity to (21), in other
contexts of inference and decision theory. Furthermore, as the Associate
Editor kindly pointed out to us, the proof of Lemma 3 can also be obtained
by appealing to Theorem 1 and equation (54) of that paper.
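
The heat equation step (21) can be checked by finite differences. The sketch below assumes $p = 1$ and the prior $\mu \sim N(0, t)$ (an illustrative choice), so that $m_\pi(z; v)$ is the $N(0, t + v)$ density; the step sizes and function names are ours.

```python
import math

def marginal(z, v, t=2.0):
    """m(z; v): marginal density of Z | mu ~ N(mu, v) under mu ~ N(0, t), p = 1."""
    var = t + v
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def d_dv(z, v, h=1e-5):
    """Central difference approximation to the derivative of m(z; v) in v."""
    return (marginal(z, v + h) - marginal(z, v - h)) / (2.0 * h)

def laplacian(z, v, h=1e-4):
    """Central second difference approximation to the second derivative of m(z; v) in z."""
    return (marginal(z + h, v) - 2.0 * marginal(z, v) + marginal(z - h, v)) / (h * h)

if __name__ == "__main__":
    for z, v in [(0.3, 0.5), (1.5, 1.0), (-2.0, 0.2)]:
        # Left and right sides of the heat equation dm/dv = (1/2) * laplacian(m)
        print(d_dv(z, v), 0.5 * laplacian(z, v))
```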

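Theorem 1. Suppose $m_\pi(z; v_x)$ is finite for all $z$.

(i) If $\nabla^2 m_\pi(z; v) \le 0$ for all $v_w \le v \le v_x$, then $\hat p_\pi(y\mid x)$ is minimax
under $R_{\mathrm{KL}}$. Furthermore, $\hat p_\pi(y\mid x)$ dominates $\hat p_U(y\mid x)$ unless $\pi = \pi_U$.

(ii) If $\nabla^2\sqrt{m_\pi(z; v)} \le 0$ for all $v_w \le v \le v_x$, then $\hat p_\pi(y\mid x)$ is minimax
under $R_{\mathrm{KL}}$. Furthermore, $\hat p_\pi(y\mid x)$ dominates $\hat p_U(y\mid x)$ if for all
$v_w \le v \le v_x$, $\nabla^2\sqrt{m_\pi(z; v)} < 0$ on a set of positive Lebesgue measure.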

Proof. As established in Corollary 1, $\hat p_U$ is minimax under $R_{\mathrm{KL}}$. Thus,
minimaxity is established by showing that (16) is nonnegative, and
dominance is established by showing that (16) is strictly positive on a set
of positive Lebesgue measure. Then (i) and (ii) follow from (18), (19) and
the fact that $v_w < v_x$. □

Corollary 2. If $m_\pi(z; v_x)$ is finite for all $z$, then $\hat p_\pi(y\mid x)$ will be
minimax if the prior density $\pi$ satisfies $\nabla^2\pi(\mu) \le 0$ a.e. Furthermore,
$\hat p_\pi(y\mid x)$ will dominate $\hat p_U(y\mid x)$ unless $\pi = \pi_U$.

Proof. It is straightforward to show (see problem 1.7.16 of [15]) that
$\nabla^2 m_\pi(z; v) \le 0$ when $\nabla^2\pi(\mu) \le 0$ a.e. Therefore, Corollary 2 follows
immediately from (i) of Theorem 1. □

The above sufficient conditions for minimaxity and domination in the KL
risk prediction problem are essentially the same as those for minimaxity
and domination in the quadratic risk estimation problem. What drives this
connection is revealed by comparing Stein's unbiased estimate of quadratic
risk reduction in (11) and (12) with (18) and (19). It follows directly from
this comparison that the risk reduction in the quadratic risk estimation
problem can be expressed in terms of $\log m_\pi$ as

(25)  $R_Q(\mu, \hat\mu_{\mathrm{MLE}}) - R_Q(\mu, \hat\mu_\pi) = -2\left[\dfrac{\partial}{\partial v}E_{\mu,v}\log m_\pi(Z; v)\right]_{v=1}.$
3. Examples. In this section we show how Theorem 1 and Corollary 2
can be applied to establish the minimaxity of $\hat p_H$ and $\hat p_a$. Compared to the
minimaxity proofs of Komaki [13] for $\hat p_H$, and of Liang [16] for $\hat p_a$, this
unified approach is more direct and more general. We further indicate how
our approach can be used to obtain wide classes of new minimax prediction
densities.



Example 1. Let us return to the Bayes predictive density $\hat p_H$, the special
case of (3) under the harmonic prior $\pi_H(\mu)$ in (6). Following Komaki [13],
the marginal of $Z\mid\mu \sim N_p(\mu, vI)$ under $\pi_H$ can be expressed as

(26)  $m_H(z; v) \propto v^{-(p-2)/2}\,\phi_p(\|z/\sqrt{v}\|),$

where $\phi_p(u) = u^{-p+2}\int_0^{u^2/2} t^{p/2-2}\exp(-t)\,dt$ is the incomplete Gamma
function. By Lemma 2, $\hat p_H$ can be expressed in terms of this marginal as

(27)  $\hat p_H(y\mid x) = \dfrac{m_H(w; v_w)}{m_H(x; v_x)}\,\hat p_U(y\mid x).$

Because $\pi_H$ is harmonic [$\nabla^2\pi_H(\mu) = 0$ a.e.], and hence superharmonic, for
$p \ge 3$, the fact that $\hat p_H$ is minimax and dominates $\hat p_U$ follows immediately
from Corollary 2.
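
The shrinkage factor in this example is straightforward to evaluate. The sketch below computes $\phi_p$ by Simpson's rule after the substitution $t = s^2$ (our choice of quadrature, assuming an integer $p \ge 3$), forms $m_H(z; v)$ up to its constant of proportionality, and returns the ratio $m_H(w; v_w)/m_H(x; v_x)$ that multiplies $\hat p_U(y\mid x)$. The function names and the number of quadrature points are hypothetical.

```python
import math

def phi(u, p, n=2000):
    """phi_p(u) = u^{-(p-2)} * int_0^{u^2/2} t^{p/2-2} e^{-t} dt, integer p >= 3.
    Uses t = s^2, so the integral becomes int_0^{u/sqrt(2)} 2 s^{p-3} e^{-s^2} ds,
    evaluated with composite Simpson's rule on n (even) subintervals."""
    if u == 0.0:
        return 2.0 ** (-(p / 2.0 - 1.0)) / (p / 2.0 - 1.0)   # limit as u -> 0
    upper = u / math.sqrt(2.0)
    f = lambda s: 2.0 * s ** (p - 3) * math.exp(-s * s)
    h = upper / n
    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return (total * h / 3.0) / u ** (p - 2)

def harmonic_marginal(z, v, p):
    """m_H(z; v) up to a constant: v^{-(p-2)/2} * phi_p(||z|| / sqrt(v))."""
    norm = math.sqrt(sum(zi * zi for zi in z))
    return v ** (-(p - 2) / 2.0) * phi(norm / math.sqrt(v), p)

def shrinkage_factor(x, y, vx, vy):
    """m_H(w; vw) / m_H(x; vx), the factor multiplying the uniform-prior density."""
    p = len(x)
    vw = vx * vy / (vx + vy)
    w = [(vy * xi + vx * yi) / (vx + vy) for xi, yi in zip(x, y)]
    return harmonic_marginal(w, vw, p) / harmonic_marginal(x, vx, p)

if __name__ == "__main__":
    x = [2.0, 0.0, 0.0, 0.0, 0.0]          # setting of Figure 1: vx = 1, vy = 0.2, p = 5
    for y1 in (0.0, 1.0, 2.0, 4.0):
        print(y1, shrinkage_factor(x, [y1, 0.0, 0.0, 0.0, 0.0], 1.0, 0.2))
```

The factor is largest for $y$ near the origin and falls below 1 for $y$ far from it, which is the reweighting toward the high-probability region of $\pi_H$ described after Lemma 2.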

Beyond $\hat p_H$, one might consider the class of Bayes predictive densities
$\hat p_\pi$ corresponding to the (improper) multivariate t priors
$\pi(\mu) = (\|\mu\|^2 + 2/a_2)^{-(a_1 + p/2)}$. Because these priors are superharmonic for
$a_1 \le -1$ and $p \ge 3$, the minimaxity and domination of $\hat p_U$ by these rules
follows immediately from Corollary 2.
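
The superharmonicity condition on $a_1$ can be checked directly. Writing $k = a_1 + p/2$ and $c = 2/a_2$, differentiating the radial function $(r^2 + c)^{-k}$ gives $\nabla^2\pi(\mu) = 2k(\|\mu\|^2 + c)^{-k-2}\{(2k + 2 - p)\|\mu\|^2 - pc\}$, which for $k > 0$ is nonpositive everywhere exactly when $a_1 \le -1$. The sketch below compares this closed form, which is our own calculation and not quoted from the text, with a finite-difference Laplacian; the function names are hypothetical.

```python
def t_prior(mu, a1, a2):
    """Multivariate t prior (||mu||^2 + 2/a2)^{-(a1 + p/2)}, unnormalized."""
    p = len(mu)
    return (sum(m * m for m in mu) + 2.0 / a2) ** (-(a1 + p / 2.0))

def laplacian_fd(f, mu, h=1e-3):
    """Finite-difference Laplacian of f at mu (sum of central second differences)."""
    base = f(mu)
    total = 0.0
    for i in range(len(mu)):
        up = list(mu)
        dn = list(mu)
        up[i] += h
        dn[i] -= h
        total += f(up) - 2.0 * base + f(dn)
    return total / (h * h)

def laplacian_exact(mu, a1, a2):
    """Closed-form Laplacian of the t prior, derived from its radial form."""
    p = len(mu)
    k = a1 + p / 2.0
    c = 2.0 / a2
    r2 = sum(m * m for m in mu)
    return 2.0 * k * (r2 + c) ** (-k - 2.0) * ((2.0 * k + 2.0 - p) * r2 - p * c)

if __name__ == "__main__":
    mu = [1.0, 2.0, 0.5]
    print(laplacian_fd(lambda m: t_prior(m, -1.0, 1.0), mu), laplacian_exact(mu, -1.0, 1.0))
    # With a1 = -0.5 the Laplacian turns positive far from the origin
    print(laplacian_exact([3.0, 0.0, 0.0], -0.5, 1.0))
```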



Example 2. Turning next to $\hat p_a$, the marginal of $Z\mid\mu \sim N_p(\mu, vI)$ under
the Strawderman prior $\pi_a$ in (7) can be expressed as

(28)  $m_a(z; v) \propto \int_0^\infty \left\{2\pi v\left(\dfrac{v_0}{v}s + 1\right)\right\}^{-p/2}\exp\left\{-\dfrac{\|z\|^2}{2v\{(v_0/v)s + 1\}}\right\}(s+1)^{a-2}\,ds.$

Because $\pi_H$ is the special case of $\pi_a$ when $a = 2$, it follows that $m_H(z; v)$
is the special case of $m_a(z; v)$ when $a = 2$. As Fourdrinier, Strawderman and
Wells [6] showed, the marginal for any proper prior cannot be
superharmonic, so that Theorem 1(i) cannot hold for $\hat p_a$ when $a < 1$.
However, Theorem 1(ii) does hold for such $\hat p_a$, because $\sqrt{m_a(z; v)}$ is
superharmonic for $v \le v_0$ when $p = 5$ and $a \in [0.5, 1)$ or $p \ge 6$ and
$a \in [0, 1)$. This fact can be obtained using $h(s) \propto (1+s)^{a-2}$ in Theorem 2
below, which extends Theorem 1 of [6].
Theorem 2. For a nonnegative function $h(s)$ over $[0, \infty)$, consider the
scale mixture prior

(29)  $\pi_h(\mu) = \int \pi(\mu\mid s v_0)\,h(s)\,ds,$

where $\pi(\mu\mid s v_0) = N_p(0, s v_0 I)$. For $Z\mid\mu \sim N_p(\mu, vI)$, let

(30)  $m_h(z; v) \propto \int_0^\infty \{2\pi v(s+1)\}^{-p/2}\exp\left\{-\dfrac{\|z\|^2}{2v(s+1)}\right\} r\,h(rs)\,ds$

be the marginal distribution of $Z$ under $\pi_h(\mu)$, where $r = v/v_0$. Let $h$ be a
positive function such that:

(i) $-(s+1)h'(s)/h(s)$ can be decomposed as $l_1(s) + l_2(s)$, where $l_1 \le A$
is nondecreasing while $0 \le l_2 \le B$ with $\frac{1}{2}A + B \le (p-2)/4$,

(ii) $\lim_{s\to\infty} h(s)/(s+1)^{p/2} = 0$.

Then $\sqrt{m_h(z; v)}$ in (30) is superharmonic for all $v \le v_0$, and when
$v_x \le v_0$, the Bayes predictive density $\hat p_h(y\mid x)$ under $\pi_h(\mu)$ in (29) is minimax.
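
For the Strawderman mixing density $h(s) \propto (1+s)^{a-2}$, condition (i) is simple to evaluate: $-(s+1)h'(s)/h(s) = 2 - a$ is constant, so one may take $l_1 \equiv 2 - a$, $A = 2 - a$, $l_2 \equiv 0$ and $B = 0$. The sketch below, with hypothetical function names, checks this decomposition numerically and then tests the inequality $\frac{1}{2}A + B \le (p-2)/4$. It covers only this one decomposition, so a `False` result means the sufficient condition is not verified this way, not that superharmonicity fails.

```python
def strawderman_h(s, a):
    """Mixing density h(s) proportional to (1 + s)^(a - 2)."""
    return (1.0 + s) ** (a - 2.0)

def neg_scaled_log_derivative(s, a, eps=1e-6):
    """Finite-difference value of -(s + 1) h'(s) / h(s); equals 2 - a for this h."""
    dh = (strawderman_h(s + eps, a) - strawderman_h(s - eps, a)) / (2.0 * eps)
    return -(s + 1.0) * dh / strawderman_h(s, a)

def condition_i_holds(p, a):
    """Check (1/2) A + B <= (p - 2) / 4 with A = 2 - a (l1 constant) and B = 0 (l2 = 0)."""
    A = 2.0 - a
    B = 0.0
    return 0.5 * A + B <= (p - 2) / 4.0

if __name__ == "__main__":
    print(neg_scaled_log_derivative(3.0, 0.5))          # close to 2 - a = 1.5
    for p, a in [(5, 0.5), (5, 0.4), (6, 0.0), (6, 0.9)]:
        print(p, a, condition_i_holds(p, a))
```

The inequality reduces to $a \ge 2 - (p-2)/2$, which gives $a \ge 0.5$ for $p = 5$ and $a \ge 0$ for $p = 6$, matching the ranges quoted in Example 2.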



Proof. The proof of Theorem 1 in [6] shows that $\sqrt{m_h(z; v_0)}$ in (30)
is superharmonic when $v_0 = 1$, and it is straightforward to show that this
is true for general $v_0$. From this fact, $\sqrt{m_h(z; v)}$ will be superharmonic
for all $v \le v_0$ if $h_r(s) := r\,h(rs)$ satisfies (i) and (ii) when $r \in (0, 1]$.

First we show that $h_r$ satisfies (i). By the assumptions on $h$, we have
$-(s+1)h'(s)/h(s)$ decomposed as $l_1(s) + l_2(s)$. Then

$-(s+1)\dfrac{h_r'(s)}{h_r(s)} = -\dfrac{r(s+1)}{rs+1}\,(rs+1)\dfrac{h'(rs)}{h(rs)} = \dfrac{r(s+1)}{rs+1}\,[l_1(rs) + l_2(rs)].$

Choose $\tilde l_i$ to be $l_i$ multiplied by $r(s+1)/(rs+1)$. They can be checked to
satisfy the conditions since the factor $(rs+r)/(rs+1)$ is a nondecreasing
function of $s$ and less than or equal to 1 when $0 < r \le 1$. To see that $h_r$
satisfies (ii), note that

$\dfrac{h_r(s)}{(s+1)^{p/2}} = r\,\dfrac{h(rs)}{(rs+1)^{p/2}}\left(\dfrac{rs+1}{s+1}\right)^{p/2}$

goes to zero when $s \to \infty$ since the first term goes to zero by the
assumption on $h$.

Thus $\sqrt{m_h(z; v)}$ will be superharmonic for all $v \le v_0$. When $v_x \le v_0$, the
minimaxity of $\hat p_h(y\mid x)$ then follows from (ii) of Theorem 1. □
Going far beyond these results, Theorem 2 can be used to obtain wide
classes of proper priors that yield minimax Bayes predictive densities $\hat p_h$.
Following the development in Section 4 of [6], such $\hat p_h$ can be obtained with
particular classes of shifted inverted gamma priors and classes of
generalized t-priors.

4. Further extensions. Priors such as $\pi_H$ and $\pi_a$ are concentrated around
0, so that the risk reduction offered by $\hat p_H$ and $\hat p_a$ will be most pronounced
when $\mu$ is close to 0. However, such priors can be readily recentered around
a different point to obtain predictive estimators that obtain risk reduction
around the new point. Because the superharmonicity of $m_\pi$ and $\sqrt{m_\pi}$ will be
unaffected under such recentering, the minimaxity and domination results of
Theorems 1 and 2 will be maintained. Minimax shrinkage toward a subspace
can be similarly obtained by recentering such priors around the projection
of $\mu$ onto the subspace.

To vastly enlarge the region of improved performance, one can go further
and construct analogues of the minimax multiple shrinkage estimators of
George [7, 8, 9] that adaptively shrink toward more than one point or
subspace. Such estimators can be obtained using mixture priors that are
convex combinations of recentered superharmonic priors at the desired
targets. Because convex combinations of superharmonic functions are
superharmonic, Corollary 2 shows that such priors will lead to minimax
multiple shrinkage predictive estimators.